FLEX: Unifying Evaluation for Few-Shot NLP
https://arxiv.org/abs/2107.07170
The authors release the FLEX benchmark, which includes four few-shot transfer settings, zero-shot evaluation, and a public leaderboard covering diverse NLP tasks.
https://github.com/allenai/flex
https://youtu.be/2CeuNW8lIZo?si=pvTaZ7KN3qFGWfIc